Avaliação da qualidade de agrupamentos em grafos
Abstract
Graph clustering, the process of discovering groups of similar, connected vertices in a graph, has applications in several areas, such as biology, marketing and recommendation systems. A major challenge in this problem is the evaluation of cluster quality, which is used to measure the effectiveness of clustering algorithms. Many quality metrics for graph cluster evaluation exist, but there is no consensus on which ones are best suited for this task, and most authors in the literature simply assume that a chosen metric is good enough, with little or no interest in evaluating the strength of such claims. To better understand the effectiveness of the most popular cluster quality metrics presented in the literature, we studied them in different scenarios. We discovered that they present strong biases and structural inconsistencies that make the quality of their results doubtful at best. Our studies demonstrated that, while those popular quality metrics generally do a good job of evaluating the external sparsity between clusters, they do poorly when evaluating internal density: they either ignore essential information, such as the cluster's vertex count, or have their internal density evaluation skipped in practice because of its computational cost. With that in mind, we proposed a new method for evaluating the internal density of a given cluster, one that not only uses more complete information to evaluate that density, but also takes into consideration structural characteristics of the original graph. With this proposed method, the internal density of a cluster is evaluated in terms of the expected density of similar clusters in that same graph. This contrasts with the traditional quality metrics, which compare clusters from different graphs by the same standards, a behavior that penalizes naturally sparser graphs.
We then proposed a new quality metric for graph clusters that combines our metric for internal quality evaluation with Conductance, a widely used metric for external sparsity evaluation. In this way, the proposed metric evaluates the two main structural characteristics expected from well-formed clusters. Our experiments showed that the proposed metric is capable of correctly penalizing badly formed clusters that were ...
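As a rough illustration of the two structural properties discussed in the abstract, the following Python sketch computes a plain internal density and the standard conductance of a cluster. These are the textbook definitions, not the metric proposed in this work, whose details the abstract does not give. The function names and the toy graph are hypothetical, and the graph is assumed to be simple and undirected (no self-loops, no duplicate edges).

```python
def internal_density(edges, cluster):
    """Fraction of possible intra-cluster edges that are actually present."""
    members = set(cluster)
    n = len(members)
    if n < 2:
        return 0.0  # convention chosen here: density is undefined below 2 vertices
    inside = sum(1 for u, v in edges if u in members and v in members)
    return inside / (n * (n - 1) / 2)


def conductance(edges, cluster):
    """Cut size divided by the smaller of the two side volumes (degree sums)."""
    members = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in != v_in:
            cut += 1  # edge crosses the cluster boundary
        # Each endpoint contributes one unit of degree to its own side.
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0  # convention: 0.0 when a side has no edges


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

if __name__ == "__main__":
    print(internal_density(EDGES, {0, 1, 2}))  # 1.0: all 3 possible edges exist
    print(conductance(EDGES, {0, 1, 2}))       # 1/7: one cut edge, volume 7 per side
```

Lower conductance indicates better external sparsity, while higher density indicates a more tightly connected cluster. Note that this plain density uses a fixed standard for every graph, which is the behavior the abstract criticizes for penalizing naturally sparser graphs.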
Similar resources
Geração de um Perfil de Qualidade Para Fontes de Dados Dinâmicas
Nowadays, a massive volume of data is produced by a variety of data sources. Easy access to these data presents new opportunities. In this sense, choosing the most suitable data sources for a specific use has become a challenge. The literature contains many works that perform quality assessment of data sources as a means of solving this issue. However, only a few works take into account ...
Impacto da amostragem aleatória uniforme para o aumento da escalabilidade na geração de agrupamentos hierárquicos de séries espaço-temporais
This paper presents the results of a scalable approach to building hierarchical clusterings from space-time series. The goal is to reduce complexity in terms of space and time. The approach explores data sampling pre-processing techniques to reduce the numerosity of the data. The experiment indicates that more efficient strategies than the naive selection of samples need to be developed (...
MAIS - Avaliação de Investimentos em SI/TI na Administração Pública
Contribute to improving the quality of management, in the spirit set out in the international standards in the field of Quality. This document aims to present the Metodologia de Avaliação de InvestimentoS (MAIS, Investment Evaluation Methodology), developed by the Instituto de Informática, to be implemented within the scope of the Process for Evaluating Investments in Systems and Technology of I...
Avaliando com usuários um método de representação, extração e mensuração de interações sociais
ABSTRACT One task explored in social media research concerns assigning meaning to the interactions that occur between users within these systems. This task is generally carried out by computing and interpreting statistical and/or graph-based measurements. In this scenario, there is a search for a descriptive model of how social interactions are ...
Heurísticas para avaliar a usabilidade de aplicações móveis: estudo de caso para aulas de campo em Geologia
To evaluate the usability of mobile applications, it is necessary to consider the peculiarities of these devices, such as mobility, hardware constraints and context of use. To improve usability evaluations of these applications, existing techniques are adapted so that these characteristics are taken into account. This paper presents the results of a study conducted to identify works th...
Metrics Development for the Qualis of Software Technical Production.
OBJECTIVE To recommend metrics to qualify software production and to propose guidelines on this issue for the CAPES quadrennial evaluation of the Post-Graduation Programs of Medicine III. METHOD Identification of the quality features of the development process, of the product attributes and of software use, as determined by the Brazilian Association of Technical Standards (ABNT), International Organ...